Video semantic segmentation is now used in many applications, such as autonomous driving, navigation systems, and virtual reality. Semantic segmentation of still images has progressed significantly in recent years. However, consecutive video frames must be processed at high speed, with low latency, and in real time, so applying image semantic segmentation methods independently to each frame is inefficient. Real-time semantic segmentation of video frames with adequate accuracy therefore remains a challenging problem. To address this challenge, we introduce a video semantic segmentation framework that reuses the semantic segmentation of previous frames to increase both speed and accuracy. To this end, we use optical flow (the motion between consecutive frames) and a GRU-based deep neural network called ConvGRU. One input to the GRU is an estimate of the current frame's semantic segmentation, produced by a pre-trained convolutional neural network; the other is the previous frame's semantic segmentation warped along the optical flow. The proposed method achieves competitive accuracy and speed on two challenging video semantic segmentation datasets: 83.1% mIoU on Cityscapes and 79.8% mIoU on CamVid. On a Tesla P4 GPU, it runs at 34 fps on Cityscapes and 36.3 fps on CamVid.
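The two GRU inputs described above can be illustrated with a minimal NumPy sketch. It is not the implementation behind the reported results, and it rests on several assumptions that the abstract does not state: the flow is a backward flow (each current-frame pixel points to its source in the previous frame), warping uses nearest-neighbour sampling with border clamping, the gates use 1x1 convolutions with random untrained weights, and the warped previous segmentation plays the role of the GRU hidden state. The names `warp_with_flow`, `conv1x1`, and `gru_fuse` are hypothetical.

```python
import numpy as np


def warp_with_flow(prev_seg, flow):
    """Backward-warp prev_seg (C, H, W) along flow (2, H, W).

    Assumed convention: flow[0] is the x displacement and flow[1] the y
    displacement from a current-frame pixel to its source in the previous
    frame. Sampling is nearest-neighbour, clamped at the image border.
    """
    _, h, w = prev_seg.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    src_x = np.clip(np.rint(xs + flow[0]).astype(int), 0, w - 1)
    src_y = np.clip(np.rint(ys + flow[1]).astype(int), 0, h - 1)
    return prev_seg[:, src_y, src_x]


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def conv1x1(weight, x):
    """1x1 convolution: weight (C_out, C_in) applied to x (C_in, H, W)."""
    return np.einsum("oc,chw->ohw", weight, x)


def gru_fuse(x, h, params):
    """GRU-style fusion of the current estimate x with the warped state h.

    Both x and h have shape (C, H, W). The gates use 1x1 convolutions here
    for brevity (an assumption of this sketch).
    """
    z = sigmoid(conv1x1(params["Wz"], x) + conv1x1(params["Uz"], h))  # update gate
    r = sigmoid(conv1x1(params["Wr"], x) + conv1x1(params["Ur"], h))  # reset gate
    h_tilde = np.tanh(conv1x1(params["Wh"], x) + conv1x1(params["Uh"], r * h))
    return (1.0 - z) * h + z * h_tilde


# Toy data: 3 classes on a 4x5 frame, with random (untrained) weights.
rng = np.random.default_rng(0)
C, H, W = 3, 4, 5
params = {
    name: rng.normal(scale=0.1, size=(C, C))
    for name in ("Wz", "Uz", "Wr", "Ur", "Wh", "Uh")
}
prev_seg = rng.random((C, H, W))  # previous frame's class scores
cur_est = rng.random((C, H, W))   # per-frame CNN estimate for the current frame

flow = np.zeros((2, H, W))
flow[0] = 1.0                     # each pixel's source lies one column to the right

warped = warp_with_flow(prev_seg, flow)
fused = gru_fuse(cur_est, warped, params)
labels = fused.argmax(axis=0)     # per-pixel class prediction, shape (H, W)
```

In this sketch the update gate `z` decides, per pixel and per class, how much of the warped previous prediction is kept and how much is replaced by the candidate computed from the current-frame estimate.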